⚡ Tokenizer Optimization
Specific
SIMD Processing, State Machines, Unicode Handling, Performance
Scoured 160095 posts in 25.3 ms
Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2
🗺️ Region Inference · aws.amazon.com · 3d
Beating Python’s GIL: Achieving a 130x Speedup in Batch Processing with Rust and Rayon
🦀 MIR Optimization · medium.com · 2d
Building CompilerSutra
🎓 Teaching Compilers · docs.google.com · 20h · DEV
OmniVoice, high-quality TTS for 600+ Languages
🔄 Incremental Lexing · zhu-han.github.io · 11h · Hacker News
Metal Quantized Attention: pulling M5 Max ahead with Int8 matrix multiplication
🗺️ Region Inference · releases.drawthings.ai · 1d · Hacker News
Speculative Decoding: Performance or Illusion?
🗺️ Region Inference · specdecode-bench.github.io · 6d · Hacker News
How we chose Positron’s Python type checker
✅ Type Checking · positron.posit.co · 2d · Hacker News
General scales unlock AI evaluation with explanatory and predictive power
🪜 Recursive Descent · nature.com · 1d
Context Rot: How Increasing Input Tokens Impacts LLM Performance
🔍 Tokenizers · trychroma.com · 6d · DEV
Donald Raab: Measuring the Startup Memory Cost for Lazy Iteration Patterns in Java
🗑️ Garbage Collection · donraab.medium.com · 2d
APL Performance
🔀 SIMD Programming · aplwiki.com · 3d · Hacker News
Intel Delivers Open, Scalable AI Performance in MLPerf Inference v6.0
🗺️ Region Inference · newsroom.intel.com · 1d
Supercharging Redpanda Streaming with profile-guided optimization
📈 Performance Tools · redpanda.com · 1d
yash27-lab/batch_forge: A high-performance, bare-metal inference engine for JAX and Equinox models written in Rust. Features zero-copy Safetensors loading and hand-optimized Metal/Vulkan compute kernels for Transformers, Vision Language Models, and State-Space Models
🗺️ Region Inference · github.com · 3d · Hacker News
Iteratively optimizing an SPSC queue
🎯 Ring Buffers · blog.c21-mac.com · 4d · r/cpp
MXFP8 GEMM: Up to 99% of cuBLAS Performance Using CUDA and PTX
🔬 Nanopasses · danielvegamyhre.github.io · 5d · Hacker News
Scaling AI Workloads in Java Without Breaking Your APIs
⚡ Interpreter Optimization · dzone.com · 6d
Discord Engineers Add Distributed Tracing to Elixir's Actor Model Without Performance Penalty
✨ Gleam · infoq.com · 5d
Systematic Analysis of CPU-Induced Slowdowns in Multi-GPU LLM Inference (Georgia Tech)
🗺️ Region Inference · semiengineering.com · 6d
Designing High-Concurrency Databricks Workloads Without Performance Degradation
🗑️ Concurrent GC · dzone.com · 6d